# Visual Text Generation

Wan2.1 T2V 1.3B GGUF

A direct GGUF conversion of Wan2.1-T2V-1.3B, suitable for text-to-video generation on consumer-grade GPUs.

Text-to-Video English

samuelchristlie

Gemma 3 12b It Qat Autoawq

Gemma 3 is Google's lightweight open model series based on Gemini technology, supporting multimodal input and text output.

A multimodal model fine-tuned from unsloth/Llama-3.2-11B-Vision-Instruct, optimized for vision-language tasks with improved instruction following; training was 2x faster using the Unsloth framework.

Transformers English

Erax VL 7B V1.5 GGUF

A quantized version of EraX-VL-7B-V1.5 that supports Vietnamese, English, and Chinese, suitable for tasks such as OCR and insurance applications.

Image-to-Text Supports Multiple Languages

Donut Base Finetuned Zhtrainticket

A Donut model fine-tuned on ZhTrainTicket for converting document images to text without a separate OCR step.

Donut Base Finetuned Cord V2

Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART), capable of directly extracting text information from images.
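Because the text decoder generates structured output directly, Donut checkpoints such as the CORD fine-tune typically emit a tagged token sequence of the form `<s_key>value</s_key>` rather than raw OCR text. The following is a minimal sketch of how such a sequence could be turned into a nested dictionary. It is a simplified illustration and not the library's own converter; the function name `donut_tokens_to_dict` and the sample string are hypothetical, and list separators or repeated keys are not handled.

```python
import re

# Matches one <s_key>...</s_key> pair; the closing tag must repeat the same key.
_TAG = re.compile(r"<s_(?P<key>[^>]+)>(?P<body>.*?)</s_(?P=key)>", re.DOTALL)


def donut_tokens_to_dict(sequence: str) -> dict:
    """Parse a Donut-style tagged sequence into a nested dict (simplified)."""
    result = {}
    for match in _TAG.finditer(sequence):
        key = match.group("key")
        body = match.group("body").strip()
        # A body that still contains tags is a nested group, so recurse into it.
        result[key] = donut_tokens_to_dict(body) if "<s_" in body else body
    return result


# Hypothetical decoder output for a receipt image (illustrative only).
sample = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total><s_total_price>4.50</s_total_price></s_total>"
)
parsed = donut_tokens_to_dict(sample)
print(parsed)
```

In practice the Hugging Face `DonutProcessor` provides a `token2json` method for this conversion; the sketch above only shows the idea behind the output format.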

© 2025 AIbase